Toward a Pan-Chinese Thesaurus

Authors

  • Benjamin Ka-Yin T'sou
  • Oi Yee Kwong
Abstract

In this paper, we propose a corpus-based approach to the construction of a Pan-Chinese lexical resource, starting out with the aim to enrich existing Chinese thesauri in the Pan-Chinese context. The resulting thesaurus is thus expected to contain not only the core senses and usages of Chinese lexical items but also usages specific to individual Chinese speech communities. We introduce the rationale underlying the construction of the resource, outline the steps to be taken, and discuss some preliminary analyses. The work is backed up by a unique and large Chinese synchronous corpus containing textual data from various Chinese speech communities including Hong Kong, Beijing, Taipei and Singapore.

Similar resources

Extending a Thesaurus with Words from Pan-Chinese Sources

In this paper, we work on extending a Chinese thesaurus with words distinctly used in various Chinese communities. The acquisition and classification of such region-specific lexical items is an important step toward the larger goal of constructing a Pan-Chinese lexical resource. In particular, we extend a previous study in three respects: (1) to improve automatic classification by removing dupl...

Extending a Thesaurus in the Pan-Chinese Context

In this paper, we address a unique problem in Chinese language processing and report on our study on extending a Chinese thesaurus with region-specific words, mostly from the financial domain, from various Chinese speech communities. With the larger goal of automatically constructing a Pan-Chinese lexical resource, this work aims at taking an existing semantic classificatory structure as levera...

A Comprehensive Chinese Thesaurus System and its Weighting Scheme

Semantic/conceptual knowledge can greatly help in the processing of Chinese information. A well designed thesaurus can comprehensively reveal various semantic relationships among different elements in the documents, thus serving as a critical tool in intelligent Chinese information processing system. In this research, we have designed a comprehensive Chinese thesaurus system which can be used in...

Combining a Chinese Thesaurus with a Chinese Dictionary

In this paper, we study the problem of combining a Chinese thesaurus with a Chinese dictionary by linking the word entries in the thesaurus with the word senses in the dictionary, and propose a similar word strategy to solve the problem. The method is based on the definitions given in the dictionary, but without any syntactic parsing or sense disambiguation on them at all. As a resu...

An Automatic Segmentation Method Combined with Length Descending and String Frequency Statistics for Chinese Text

This paper puts forward a new method for automatic Chinese text segmentation based on Chinese character string (CCS) frequency and length descending. It can automatically segment meaningful CCSs in text by processing longer strings first and using string frequency information, with no thesaurus, no need to acquire the probabilities between words in advance, and no Chinese character index. This method can effective...

Publication date: 2006